In this paper we present a novel multi-attribute face manipulation method based on textual descriptions. Previous text-based image editing methods either require test-time optimization for each individual image or are restricted to single-attribute editing. Extending these methods to multi-attribute face image editing introduces undesired, excessive attribute changes: text-relevant attributes are overly manipulated, and text-irrelevant attributes are changed as well. To address these challenges and achieve natural editing over multiple face attributes, we propose a new decoupling training scheme in which we use group sampling to draw text segments from the same attribute category, instead of whole complex sentences. Further, to preserve other existing face attributes, we encourage the model to edit the latent code of each attribute separately via an entropy constraint. During inference, our model can edit new face images without any test-time optimization, even from complex textual prompts. We present extensive experiments and analysis demonstrating the efficacy of our method, which generates naturally manipulated faces with minimal text-irrelevant attribute editing. Code and the pre-trained model will be released.
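The entropy constraint can be pictured as penalizing a text segment whose edit spreads over many attribute latents. Below is a minimal numeric sketch of that intuition; the attribute count and affinity scores are made up for illustration and are not the paper's model.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector."""
    return float(-np.sum(p * np.log(p + eps)))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# A segment's (hypothetical) affinity over 4 attribute latent groups.
focused = softmax(np.array([4.0, 0.1, -0.2, 0.3]))  # edits mostly one group
diffuse = softmax(np.zeros(4))                      # edits all groups equally
# Minimizing entropy during training pushes edits toward the focused pattern.
print(entropy(focused) < entropy(diffuse))  # → True
```

A low-entropy affinity concentrates each text segment's effect on a single attribute's latent code, which is the decoupling behavior the abstract describes.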
Fine-grained recognition aims to successfully discriminate action categories with subtle differences. To tackle this problem, we take inspiration from the human visual system, which contains specialized regions in the brain dedicated to processing specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are activated only for subsets of highly similar samples. During training, the loss forces the specialized neurons to learn the discriminative fine-grained differences that distinguish these similar samples, thereby improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture more spatial or temporal fine-grained information, to better handle the wide range of spatio-temporal variations in videos. Finally, we design an upstream-downstream learning algorithm to optimize the model's dynamic decisions during training, improving the performance of the DSTS module. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets.
On-device directional hearing requires separating an audio source from a given direction while meeting stringent, humanly-imperceptible latency requirements. While neural networks can achieve significantly better performance than traditional beamformers, all existing models fall short of supporting low-latency causal inference on computationally-constrained wearables. We present a hybrid model that combines a traditional beamformer with a custom lightweight neural network. The former reduces the computational burden of the latter and also improves its generalizability, while the latter is designed to further reduce memory and computational overhead to enable real-time, low-latency operation. Our evaluation shows performance comparable to state-of-the-art causal inference models on synthetic data, while achieving a 5x reduction in model size, a 4x reduction in computation per second, a 5x reduction in processing time, and better generalization to real hardware data. Furthermore, our real-time hybrid model runs in 8 ms on a mobile CPU designed for low-power wearable devices and achieves an end-to-end latency of 17.5 ms.
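The "traditional beamformer" front end the hybrid model builds on can be illustrated with the simplest variant, delay-and-sum: delay each microphone channel so that the target direction's wavefront aligns, then average. This is a toy sketch with integer circular delays, not the paper's implementation; real systems use fractional delays and adaptive weights.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Align each channel by its steering delay (integer samples, circular
    shift for simplicity) and average the aligned channels."""
    out = np.zeros(signals.shape[1])
    for ch, d in zip(signals, delays):
        out += np.roll(ch, -d)
    return out / len(signals)

fs = 16000
t = np.arange(256) / fs
src = np.sin(2 * np.pi * 440 * t)        # target source signal
d = 3                                    # inter-mic arrival delay (samples)
mics = np.stack([src, np.roll(src, d)])  # second mic hears the source late
y = delay_and_sum(mics, [0, d])          # steer toward the target direction
```

Steering toward the true direction coherently sums the target while averaging down sources from other directions, which is exactly the computational-load reduction the hybrid design exploits before the neural network stage.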
Deep neural operators can learn nonlinear mappings between infinite-dimensional function spaces via deep neural networks. As promising surrogate solvers of partial differential equations (PDEs) for real-time prediction, deep neural operators such as deep operator networks (DeepONets) provide a new simulation paradigm in science and engineering. Pure data-driven neural operators and deep learning models, in general, are usually limited to interpolation scenarios, where new predictions utilize inputs within the support of the training set. However, in the inference stage of real-world applications, the input may lie outside the support, i.e., extrapolation is required, which may result in large errors and unavoidable failure of deep learning models. Here, we address this challenge of extrapolation for deep neural operators. First, we systematically investigate the extrapolation behavior of DeepONets by quantifying the extrapolation complexity via the 2-Wasserstein distance between two function spaces, and we propose a new bias-variance trade-off behavior for extrapolation with respect to model capacity. Subsequently, we develop a complete workflow, including extrapolation determination, and we propose five reliable learning methods that guarantee a safe prediction under extrapolation by requiring additional information -- the governing PDEs of the system or sparse new observations. The proposed methods are based on either fine-tuning a pre-trained DeepONet or multifidelity learning. We demonstrate the effectiveness of the proposed framework for various types of parametric PDEs. Our systematic comparisons provide practical guidelines for selecting a proper extrapolation method depending on the available information, desired accuracy, and required inference speed.
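The 2-Wasserstein distance between two input-function distributions is the abstract's proposed measure of extrapolation complexity. As a hedged illustration only (not the authors' exact pipeline), W2 has a closed form between Gaussian measures, which covers Gaussian-random-field input spaces sampled on a grid; the example covariances below are hypothetical.

```python
import numpy as np

def psd_sqrt(a):
    """Symmetric square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def w2_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form 2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2):
    W2^2 = |mu1 - mu2|^2 + tr(cov1 + cov2 - 2 (cov2^1/2 cov1 cov2^1/2)^1/2)."""
    s2 = psd_sqrt(cov2)
    cross = psd_sqrt(s2 @ cov1 @ s2)
    gap = np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cross)
    return float(np.sqrt(max(gap, 0.0)))

# Same mean, different covariance scale -- a crude proxy for a test-time input
# distribution drifting away from the training distribution:
mu = np.zeros(1)
print(w2_gaussian(mu, np.array([[1.0]]), mu, np.array([[4.0]])))  # → 1.0
```

A larger W2 between the training and test function distributions would indicate a harder extrapolation regime, which is where the abstract's safe-prediction methods become relevant.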
In this paper, we present a novel self-supervised method to estimate depth for future, unobserved real-world frames. This work is the first to explore self-supervised learning for monocular depth estimation of future, unobserved frames in a video. Existing works rely on a large number of annotated samples to generate probabilistic predictions of depth for unseen frames. However, the need for large amounts of annotated video samples makes them impractical. In addition, the probabilistic nature of the problem, where one past can have multiple future outcomes, often leads to incorrect depth estimates. Unlike previous methods, we model depth estimation of the unobserved frame as a view-synthesis problem, treating depth estimation of the unseen video frame as an auxiliary task while synthesizing the view back using learned poses. This approach is not only cost-effective (we do not use any ground-truth depth for training, hence practical) but also deterministic (a past frame maps to the immediate future). To solve this task, we first develop a novel depth-forecasting network, DeFNet, which estimates the depth of the unobserved future by forecasting latent features. Second, we develop a channel-attention-based pose estimation network that estimates the pose of the unobserved frame. Using this learned pose, the estimated depth map is reconstructed back into the image domain, forming a self-supervised solution. Our proposed approach shows significant improvements in the Abs Rel metric compared to state-of-the-art alternatives in both short- and mid-term forecasting settings, benchmarked on KITTI and Cityscapes. Code is available at https://github.com/sauradip/depthforecasting
With their capability to deal with graph data, which is widely found in practical applications, graph neural networks (GNNs) have attracted significant research attention in recent years. As societies become increasingly concerned with the need for data privacy protection, GNNs face the need to adapt to this new normal. Besides, as clients in Federated Learning (FL) may have relationships, more powerful tools are required to utilize such implicit information to boost performance. This has led to the rapid development of the emerging research field of federated graph neural networks (FedGNNs). This promising interdisciplinary field is highly challenging for interested researchers to grasp, and the lack of an insightful survey on the topic further exacerbates the entry difficulty. In this paper, we bridge this gap by offering a comprehensive survey of this emerging field. We propose a 2-dimensional taxonomy of the FedGNNs literature: 1) the main taxonomy provides a clear perspective on the integration of GNNs and FL by analyzing how GNNs enhance FL training as well as how FL assists GNNs training, and 2) the auxiliary taxonomy provides a view on how FedGNNs deal with heterogeneity across FL clients. Through discussions of key ideas, challenges, and limitations of existing works, we envision future research directions that can help build more robust, explainable, efficient, fair, inductive, and comprehensive FedGNNs.
We introduce the first one-shot personalized sketch segmentation method. We aim to segment all sketches belonging to the same category as a single sketch with a given part annotation, while (i) preserving the part semantics embedded in the exemplar, and (ii) being robust to input style and abstraction. We refer to this scenario as personalized. Thereby, we importantly enable a much-desired personalization capability for downstream fine-grained sketch analysis tasks. To train a robust segmentation module, we deform the exemplar sketch to each of the available sketches of the same category. Our method generalizes to sketches not observed during training. Our central contribution is a sketch-specific hierarchical deformation network. Given a multi-level sketch-stroke encoding obtained via a graph convolutional network, our method estimates rigid-body transformations from the exemplar to the reference at the upper level. Finer deformations from the exemplar to the globally-warped reference sketch are further obtained at the lower level via stroke-wise deformations. Both levels of deformation are guided by mean squared distances between keypoints learned without supervision, ensuring that stroke semantics are preserved. We evaluate our method against state-of-the-art segmentation and perceptual grouping baselines re-purposed for the one-shot setting, as well as against two few-shot 3D shape segmentation methods. We show that our method outperforms all alternatives by more than 10% on average. Ablation studies further demonstrate that our method is robust to personalization: changes in input part semantics and style differences.
Super-resolution (SR) is a fundamental and representative task in the low-level vision area. It is generally believed that the features extracted from SR networks carry no specific semantic information, and that the networks simply learn a complex nonlinear mapping from input to output. Can we find any "semantics" in SR networks? In this paper, we give an affirmative answer to this question. By analyzing the feature representations with dimensionality reduction and visualization, we successfully discover deep semantic representations in SR networks, i.e., deep degradation representations (DDR), which relate to image degradation types and degrees. We also reveal the differences in representation semantics between classification and SR networks. Through extensive experiments and analyses, we draw a series of observations and conclusions that are of significance for future work, such as interpreting the intrinsic mechanisms of low-level CNN networks and developing new evaluation approaches for blind SR.
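The discovery procedure described here, dimensionality reduction plus visualization of deep features, can be imitated on synthetic stand-in features. The clusters below are fabricated purely for illustration; the paper analyzes real SR-network features, not random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in "deep features" for two degradation types, offset along a shared axis.
feats_a = rng.normal(size=(100, 64)) + 3.0   # e.g. blur-degraded inputs
feats_b = rng.normal(size=(100, 64)) - 3.0   # e.g. noise-degraded inputs
X = np.vstack([feats_a, feats_b])

# PCA via SVD of the centered feature matrix.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:2].T                         # 2-D PCA projection

# If a degradation-related representation exists, the leading component
# separates the two degradation clusters.
gap = abs(proj[:100, 0].mean() - proj[100:, 0].mean())
```

Clustering by degradation type in such a low-dimensional projection is the kind of evidence the abstract's DDR claim rests on.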
Discovering governing equations of a physical system, represented by partial differential equations (PDEs), from data is a central challenge in a variety of areas of science and engineering. Current methods require either some prior knowledge (e.g., candidate PDE terms) to discover the PDE form, or a large dataset to learn a surrogate model of the PDE solution operator. Here, we propose the first solution operator learning method that only needs one PDE solution, i.e., one-shot learning. We first decompose the entire computational domain into small domains, where we learn a local solution operator, and then we find the coupled solution via either mesh-based fixed-point iteration or meshfree local-solution-operator informed neural networks. We demonstrate the effectiveness of our method on different PDEs, and our method exhibits a strong generalization property.
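The mesh-based fixed-point coupling idea can be sketched on the simplest possible case. Note the hedge: here the learned local solution operator is replaced by the exact local solve of u'' = 0 on a three-point patch (the midpoint average), and the fixed-point sweep stitches these local solves into the global solution.

```python
import numpy as np

n = 51
x = np.linspace(0.0, 1.0, n)
u = np.zeros(n)
u[0], u[-1] = 0.0, 1.0                   # global boundary conditions

# Fixed-point iteration: repeatedly apply the local solution operator on each
# small (3-point) domain until the patches agree on a coupled global solution.
for _ in range(10000):
    u[1:-1] = 0.5 * (u[:-2] + u[2:])     # local solve of u'' = 0 on a patch

# The exact solution of u'' = 0 with these boundary values is u(x) = x.
print(np.max(np.abs(u - x)))             # converges toward 0
```

In the paper's setting the patch update would come from a learned local operator and could also be coupled via operator-informed neural networks instead of this mesh sweep.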
We study finite-time-horizon continuous-time linear-quadratic reinforcement learning problems in an episodic setting, where both the state and control coefficients of the controller's system are unknown. We first propose a least-squares algorithm based on continuous-time observations and controls, and establish a logarithmic regret bound of order $O((\ln M)(\ln\ln M))$, where $M$ is the number of learning episodes. The analysis consists of two parts: a perturbation analysis, which exploits the regularity and robustness of the associated Riccati differential equation; and a parameter estimation error analysis, which relies on sub-exponential properties of continuous-time least-squares estimators. We further propose a practically implementable least-squares algorithm based on discrete-time observations and piecewise-constant controls, which achieves a similar logarithmic regret with an additional term depending explicitly on the time step size used in the algorithm.
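The continuous-time least-squares estimator analyzed here is beyond a short sketch, but its core idea, regressing observed transitions on state-control pairs to recover the unknown coefficients, can be illustrated in discrete time. The system matrices and noise level below are hypothetical, chosen only to make the regression visible.

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
B_true = np.array([[0.0],
                   [1.0]])

# Roll out x_{t+1} = A x_t + B u_t + noise under exploratory controls.
T = 2000
X = np.zeros((T + 1, 2))
U = rng.normal(size=(T, 1))
for t in range(T):
    X[t + 1] = A_true @ X[t] + B_true @ U[t] + 0.01 * rng.normal(size=2)

# Least squares: stack regressors z_t = [x_t, u_t] and solve for [A B].
Z = np.hstack([X[:-1], U])
Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
A_hat, B_hat = Theta[:2].T, Theta[2:].T
```

As the number of observed transitions grows, the estimation error of `A_hat` and `B_hat` shrinks; the paper's regret analysis quantifies the continuous-time, episodic analogue of this concentration.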